ABSTRACT


It is argued that the specification of problems of experimental design
(and in particular, of response surface design) should depend on scientific
context. The specification for a widely developed theory of "alphabetic
optimality" for response surface applications is analyzed and found to be
unduly limiting. Ways in which designs might be chosen to satisfy a set of
criteria of greater scientific relevance are suggested. Detailed consideration
is given to regions of operability and interest, to the design information
function, to sensitivity of criteria to size and shape of the region, and to
the effect of bias. Problems discussed include checking for lack of fit,
sequential assembly, orthogonal blocking, estimation of error, estimation of
transformations, robustness to bad values, using minimum numbers of points,
and employing simple data patterns.
SIGNIFICANCE AND EXPLANATION


Response surface methods for investigating empirical relationships, and
in particular for finding the experimental conditions for which some response
is  maximized,  were  introduced  some  thirty  years  ago.  Since  that  time  these 
methods  have  been  widely  applied,  and  much  research  has  gone  into  improving 
them,  in  particular  the  selection  of  suitable  experimental  designs.  One 
prolific  line  of  mathematical  research  has  concerned  particular  optimality 
criteria  (e.g.  D,  G,  A,  and  E  optimality)  called  here  "alphabetic  optimality." 
The  assumptions  and  specifications  which  motivate  alphabetic  optimality  are 
discussed,  and  found  to  be  unduly  limiting  so  far  as  response  surface  design 
is  concerned.  An  approach  having  greater  scientific  relevance  is  discussed. 
The responsibility for the wording and views expressed in this descriptive summary
lies  with  MRC,  and  not  with  the  author  of  this  report. 


George  E.  P.  Box 


University  of  Wisconsin  -  Madison 


1. INTRODUCTION

There  seems  no  doubt  that  of  all  the  activities  in  which  the 
statistician  can  engage,  that  of  designing  experiments  is  by  far  the  most 
important,  since  it  is  here  that  the  actual  mode  of  generation  of  scientific 
data  is  decided. 

The  importance  of  practice  in  guiding  the  development  of  the  theory  of 
experimental  design  [45]  is  clearly  seen  from  the  time  of  its  invention. 
Fisher  was  engaged  by  Russell  [16]  on  a  temporary  basis  at  Rothamsted 
Experimental  Station  in  1919  "to  examine  our  data  and  elicit  further 
information  that  we  had  missed."  Records  were  available  from  the  ongoing 
Broadbalk  experiment  in  which  particular  combinations  of  fertilizers  had  been 
consistently  applied  to  13  plots  for  a  period  of  almost  70  years.  In  his 
analysis  ([22],  [24]),  Fisher  attempted  to  relate  yield  to  fertilizer 
combination,  to  weather,  and  in  particular,  to  rainfall.  The  method  he  used 
was  multiple  regression  with  distributed  lag  models,  involving  an  ingenious 
employment  of  orthogonal  polynomials  which  led  to  important  advances  in  the 
theory  of  regression  analysis,  and  in  particular  its  distribution  theory. 
With only the crudest of computational aids, the work must have been
burdensome,  making  it  all  the  more  frustrating  to  discover  that,  however 
ingenious  the  analysis,  the  inherent  nature  of  the  data  ensured  that  the 
answers  to  many  questions  were  inaccessible.  A  comprehension  of  the  logical 
problems  in  drawing  conclusions  from  such  analyses  led  naturally  to 
speculation  on  how  some  of  the  difficulties  might  be  overcome  by  appropriate 
design.  These  ideas  were  further  stimulated  by  the  Analysis  of  Variance, 
which  Fisher  introduced  in  1923  with  W.A.  Mackenzie  [23]  for  the  elucidation 
of  what  was  clearly  a  most  unsatisfactory  design  which  he  had  had  no  part  in 
choosing.  Thereafter,  as  Fisher  gradually  acquired  more  influence  in  the 
setting  up  of  field  trials,  the  principles  of  replication,  randomization  and 
their  application  to  randomized  blocks,  latin  squares  and  factorial  designs 
quickly  evolved  out  of  the  actual  planning,  running,  and  analysis  of  a  series 
of  experimental  designs  of  increasing  complexity  and  beauty. 

The  practical  context  of  scientific  experimentation  continued  to 
produce  important  theoretical  advances  when  Yates  came  to  Rothamsted  in  1931, 
leading  in  particular  to  important  developments  in  the  design  and  analysis  of 
complex factorial designs and their associated systems of confounding ([44],
[46]) and to the introduction of incomplete block designs.

My  own  experience  with  experimental  design  began  during  the  Second 
World War. I worked at the Chemical Defense Experimental Station in England
with  a  group  of  medical  research  workers  who  were  attempting,  using  animals 
and  volunteers,  to  find  ways  to  combat  the  effects  of  poison  gas  and  other 
toxic  agents.  At  this  time  it  was  believed  that  these  agents  might  be  used 
not  only  against  the  military,  but  also  against  the  civilian  population.  It 
was  important  therefore  that  our  work  should  progress  as  rapidly  as 
possible.  I  found  myself  a  part  of  evolving  investigations  which  employed 
sequences of experiments which I designed and whose nature needed to adapt to
changing  needs  at  different  stages  of  the  study.  The  designs  employed  were 
randomized  blocks,  balanced  incomplete  blocks,  latin  squares,  and 
factorials.  Later,  during  my  eight  years  as  a  statistician  with  Imperial 
Chemical  Industries,  my  role  was  again  as  a  member  of  various  scientific  teams 
tackling evolving problems with sequences of designs. Many of the problems
were similar to those I had previously encountered, and again called for the (by
now) standard designs of Fisher and Yates. However, some investigations
directly concerned with the improvement of chemical processes at the lab,
pilot plant, and full scale, seemed to require additional methods, which,
however, still drew on the fundamental principles laid down by the originators
of  experimental  design.  This  led  to  the  development  of  what  has  come  to  be 
called  response  surface  methodology.  See  for  example  [4],  [14],  [15],  [30], 
[31],  and  [39]. 

Suppose some response $\eta$ of interest is believed to be locally
approximated by a polynomial of low degree in k continuous experimental
variables $x = (x_1, x_2, \ldots, x_k)'$. To fit such a function we need appropriate
experimental designs. Let us call a design suitable for estimating a general
polynomial of degree d a dth order design in k variables. Thus a design
suitable for fitting the function

$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \qquad (1)$$

would be a second order design in two variables.

One route for choosing such designs, which has generated an enormous
amount of mathematical research over the last twenty or so years, we shall
refer to as the "alphabetic optimality" approach. For reasons I will explain,
I have reservations about the usefulness of this approach so far as response
surface designs are concerned. For completeness, a brief summary of some of
the main ideas is set out below ([32], [33], [34], [35], [36], [42], [43]).




2.  SOME  ASPECTS  OF  OPTIMAL  DESIGN  THEORY 
FOR  CONTINUOUS  EXPERIMENTAL  VARIABLES 

Consider a response $\eta$ which is supposed to be an exactly known
function $\eta = x'\beta$, linear in p coefficients $\beta$, where
$x = (f_1(\xi), f_2(\xi), \ldots, f_p(\xi))'$ is a vector of p functions of k experimental
variables $\xi$. Suppose a design is to be run defining n sets of k experimental
conditions given by the $n \times k$ design matrix $\{\xi_u\}$ and yielding n observations
$\{y_u\}$, so that

$$\eta_u = x_u'\beta \qquad (u = 1, 2, \ldots, n)$$

where $y_u - \eta_u$ is distributed $N(0, \sigma^2)$, and $X = \{x_u'\}$ is the $n \times p$ matrix
whose rows are the $x_u'$.

The elements of $\{c_{ij}\} = (X'X)^{-1}$ are proportional to the variances and
covariances of the least squares estimates $\hat\beta$. Within this specification, the
problem of experimental design is that of choosing the design $\{\xi_u\}$ so that
the elements $c_{ij}$ are to our liking. Because there are $\frac{1}{2}p(p+1)$ of these,
simplification is desirable.

A motivation for simplification is provided by considering the
confidence region¹ for $\beta$

$$(\beta - \hat\beta)'X'X(\beta - \hat\beta) = \text{constant}$$

defining an ellipsoid in the p parameters. The eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ of
$(X'X)^{-1}$ are proportional to the squared lengths of the p principal axes of
this ellipsoid. Suppose their maximum, arithmetic mean, and geometric mean
are denoted by $\lambda_{\max}$, $\bar\lambda$, and $\tilde\lambda$. Then it is illuminating to consider the
transformation of the $\frac{1}{2}p(p+1)$ elements $c_{ij}$ to a corresponding number of
items as follows:

(i) $D = |X'X| = \tilde\lambda^{-p}$ (so that $D^{-1/2} = \tilde\lambda^{p/2}$ is proportional to
the volume of the confidence ellipsoid).

(ii) H, a vector of p - 1 homogeneous functions of degree zero in
the $\lambda$'s, which measure the non-sphericity or state of ill-conditioning
of the ellipsoid. In particular we might choose, for two of these,
$H_1 = \bar\lambda/\tilde\lambda$ and $H_2 = \lambda_{\max}/\bar\lambda$, both of which would take the value
unity for a spherical region.

(iii) $\frac{1}{2}p(p-1)$ independent direction cosines which determine the
orientation of the orthogonal axes of the ellipsoid.

1 Obviously there are also parallel fiducial and Bayesian rationalizations.


It is traditionally assumed that the $\frac{1}{2}p(p-1)$ elements concerned with
orientation of the ellipsoid are of no interest, and attention has been
concentrated on particular criteria which measure, in one way or another, the
sizes of the eigenvalues, and hence some combination of size and sphericity of
the confidence ellipsoid. Among these criteria are

$$
\begin{aligned}
\text{D:}\quad & |X'X| = \textstyle\prod_i \lambda_i^{-1} = \tilde\lambda^{-p} \\
\text{A:}\quad & \textstyle\sum_i \lambda_i = \operatorname{tr}(X'X)^{-1} = \sigma^{-2}\sum_i \operatorname{var}(\hat\beta_i) = p\bar\lambda \\
\text{E:}\quad & \max_i\{\lambda_i\} = \lambda_{\max}
\end{aligned}
$$

The desirability of a design, as measured by the D, A, and E criteria,
increases as $\tilde\lambda$, $\bar\lambda$, and $\lambda_{\max}$, respectively, are decreased. But in practical
situations, each of these quantities will take smaller, and hence more desirable,
values as the ranges for the experimental variables $\xi$ are taken larger and
larger. To cope with this problem it is usually assumed that the experimental
variables $\xi_u$ may vary only within some exactly known region in the space of
$\xi$, but not outside it. I will call this permissible region RO.
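As a rough numerical illustration of the quantities just defined, the sketch below computes $\tilde\lambda$, $\bar\lambda$ and $\lambda_{\max}$, the D, A and E criteria, and the sphericity measures $H_1$, $H_2$ for a 3² factorial fitted with the quadratic of equation (1). The choice of design and the helper name `second_order_terms` are illustrative assumptions, not part of the original analysis. The identities checked are $|X'X| = \tilde\lambda^{-p}$ and $\operatorname{tr}(X'X)^{-1} = p\bar\lambda$.

```python
import itertools

import numpy as np


def second_order_terms(x1, x2):
    """Regressors for the quadratic model of equation (1): 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


# Illustrative design: the 3^2 factorial in standardized variables
design = list(itertools.product((-1.0, 0.0, 1.0), repeat=2))
X = np.array([second_order_terms(x1, x2) for x1, x2 in design])
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
p = X.shape[1]

# Eigenvalues of (X'X)^{-1}; eigvalsh is used because the matrix is symmetric
lam = np.linalg.eigvalsh(XtX_inv)
lam_max = lam.max()
lam_bar = lam.mean()                    # arithmetic mean
lam_tilde = np.exp(np.log(lam).mean())  # geometric mean

D = np.linalg.det(XtX)      # D criterion, equal to lam_tilde ** (-p)
A = np.trace(XtX_inv)       # A criterion, equal to p * lam_bar
E = lam_max                 # E criterion
H1 = lam_bar / lam_tilde    # >= 1, equal to 1 only for a spherical ellipsoid
H2 = lam_max / lam_bar      # >= 1, equal to 1 only for a spherical ellipsoid

print(f"D = {D:.1f}, A = {A:.4f}, E = {E:.4f}, H1 = {H1:.3f}, H2 = {H2:.3f}")
```

Multiplying every design coordinate by a factor greater than one cannot increase any of $\tilde\lambda$, $\bar\lambda$ or $\lambda_{\max}$, which is the scale problem described in the text.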

Another  characteristic  of  the  problem  which  makes  its  study 
mathematically  difficult  is  the  necessary  discreteness  of  the  number  of  runs 
which  can  be  made  at  any  given  location.  In  a  technically  brilliant  paper 
[37],  Kiefer  and  Wolfowitz  dealt  with  this  obstacle  by  introducing  a 
continuous design measure $\xi$ which determines the proportion of runs which
should  ideally  be  made  at  each  of  a  number  of  points  in  the  x  space. 
Realizable  designs  which  most  nearly  approximated  the  optimal  distribution 
could  then  be  used  in  practice. 

A further important result of Kiefer and Wolfowitz linked the problem
of estimating $\beta$ with that of estimating the response $\eta$ via the property of
"G-optimality." G-optimal designs were defined as those which minimized the
maximum value of $V(\hat y_x)$ within RO. The authors were then able to show, for
their measure designs, the equivalence of G- and D-optimality. Furthermore,
they showed that, for such a design, within the region RO, the maximum value
of $n\,\mathrm{Var}(\hat y_x)/\sigma^2$ was p, and that this value was actually attained at each of
the design points.

For  illustration  we  consider  a  second  order  measure-design  in  two 
variables;  that  is,  a  design  appropriate  for  the  fitting  of  the  second  degree 
polynomial  of  equation  (1).  Such  a  design  which  is  both  D-  and  G-  optimal  for 
a square region RO with vertices (±1, ±1) was given by Fedorov [21] (see
also Herzberg [27]). The design places 14.6% of the measure at each of the
four  vertices,  8.0%  at  each  of  the  midpoints  of  the  edges,  and  9.6%  at  the 
origin.  The  design  is  set  out  in  Figure  4(b). 
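The Kiefer-Wolfowitz property can be checked numerically for the measure design just described. The sketch below (Python with NumPy, an assumption of this illustration) builds the normalized information matrix $M = \sum_i w_i f(x_i) f(x_i)'$ from the quoted weights. It then evaluates $f(x)'M^{-1}f(x)$, the measure-design analogue of $n\,\mathrm{Var}(\hat y_x)/\sigma^2$, over a grid on the square and at the nine support points. Because the weights are the rounded percentages given in the text, the values come out close to $p = 6$ rather than exactly equal to it.

```python
import numpy as np


def second_order_terms(x1, x2):
    """Regressor vector f(x) for the quadratic model of equation (1); p = 6."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])


# Measure design quoted above (weights are the rounded percentages)
support = ([(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]   # 4 vertices
           + [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]     # 4 edge midpoints
           + [(0.0, 0.0)])                                           # origin
weights = [0.146] * 4 + [0.080] * 4 + [0.096]

# Normalized information matrix M = sum_i w_i f(x_i) f(x_i)'
M = sum(w * np.outer(second_order_terms(*pt), second_order_terms(*pt))
        for pt, w in zip(support, weights))
M_inv = np.linalg.inv(M)


def standardized_variance(x1, x2):
    """f(x)' M^{-1} f(x): the measure-design analogue of n Var(y_hat_x) / sigma^2."""
    fx = second_order_terms(x1, x2)
    return float(fx @ M_inv @ fx)


grid = np.linspace(-1.0, 1.0, 101)
v_max = max(standardized_variance(a, b) for a in grid for b in grid)
v_support = [standardized_variance(*pt) for pt in support]
print(f"maximum over RO: {v_max:.3f}")
print("at the design points:", [round(v, 3) for v in v_support])
```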

While  this  approach  has  generated  much  interesting  mathematics,  it  does 
not,  I  believe,  solve  the  problem  of  choosing  good  response  surface  designs. 

In  the  hope  of  stimulating  new  initiative,  I  have  set  out  below  what  I  believe 
is  the  scientific  context  for  response  surface  studies  and  indicated  some 
possible  lines  of  development. 

3. THE RESPONSE SURFACE CONTEXT

As an example suppose it is desired to study some chemical system, with
the object of obtaining a higher value for a response $\eta$, such as yield, which
is initially believed to be some function $\eta = g(\xi)$ of k continuous input
variables $\xi = (\xi_1, \xi_2, \ldots, \xi_k)'$ such as reaction time, temperature, or
concentration. As is illustrated in Figure 1, it is usually known initially
that the system can be operated at some point $\xi_0$ in the space of $\xi$, and it is
expected to be capable of operating over some much more extensive region
O, called the operability region, which however is usually unknown or poorly
known. Response surface methods are employed when the nature of the true
response function $\eta = g(\xi)$ is also unknown²,³ or is inaccessible.

2  One  secondary  object  of  the  investigation  may  be  to  find  out  more  about  the 
operability  region  O. 

3 Occasionally the true functional form $\eta = g(\xi)$ may be known, or at least
conjectured, from knowledge of physical mechanisms. Typically, however, $g(\xi)$
will  then  appear  as  a  solution  of  a  set  of  differential  equations  which  are 
nonlinear  in  a  number  of  parameters  which  may  represent  physical  constants. 
Problems  of  nonlinear  experimental  design  then  arise  which  are  of  considerable 
interest  although  they  have  received  comparatively  little  attention  (see  for 
example  [13],  [18],  [25]). 
Suppose that over some (typically much less extensive) immediate region
of interest R in the neighborhood of $\xi_0$ it is guessed that a "graduating"
function, such as a dth degree polynomial in $\xi$,

$$\eta = x'\beta$$

might provide a locally adequate approximation to the true function
$\eta = g(\xi)$, where as before x is a p-dimensional vector of suitably
transformed input variables $x' = \{f_1(\xi), f_2(\xi), \ldots, f_p(\xi)\}$ and $\beta$ is a vector
of coefficients occurring linearly that may be adjusted to approximate the
unknown true response function $\eta = g(\xi)$. Then progress may be achieved by
using a sequence of such approximations. For example, when a first degree
polynomial approximation could be employed, it might, via the method of
steepest ascent, be used to find a new region of interest $R_1$ where, say, the
yield was higher. Also, a maximum in many variables is often represented by
some rather complicated ridge system⁴, and a second degree polynomial
approximation, when suitably analysed, might be used to elucidate, describe, and
exploit such a system.
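A minimal sketch of the steepest ascent step follows. It assumes a 2² factorial in coded units, with responses generated from a hypothetical plane $60 + 3x_1 + 2x_2$ that was chosen only so that the answer is known in advance; these are not data from any study. The fitted first degree coefficients give the direction of steepest ascent, and candidate runs are placed along it. The path is computed in coded units, so it depends on the scaling chosen for the variables.

```python
import numpy as np

# 2^2 factorial in coded units: columns x1, x2
design = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
X = np.column_stack([np.ones(len(design)), design])   # model matrix for 1, x1, x2

# Hypothetical noise-free responses from an assumed plane 60 + 3 x1 + 2 x2
y = 60.0 + 3.0 * design[:, 0] + 2.0 * design[:, 1]

# Least squares fit of the first degree approximation
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# The direction of steepest ascent is proportional to the fitted slopes (b1, b2)
slopes = b[1:]
direction = slopes / np.linalg.norm(slopes)

# Candidate runs at coded distances 1, 2 and 3 from the design centre
path = [step * direction for step in (1.0, 2.0, 3.0)]
for point in path:
    print(np.round(point, 3))
```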

Thus we are typically involved in using a sequence of designs, each
making use of information gleaned from earlier experiments, a characteristic
typical of a much wider field of scientific investigation. This provides the
opportunity to progressively improve not only the objective function $\eta$
directly, but also the mode of gathering information about it. For example,
at the jth stage, a design performed in a region $R_j$ may suggest that a new
region $R_{j+1}$ is worthy of investigation (either because it can be expected to
give higher values of $\eta$ or because it may throw light on other important
aspects of the function). But this new region may differ not only (a) in its
location in the space of $\xi$, but also (b) in its shape (for instance
because of information fed back from previous data on transformations of the
$\xi$'s individually or jointly), and (c) in the identity of its component space
(because of feedback from the results themselves, indicating that certain
variables should be dropped and/or that new variables should be added). Thus
in any realistic view of the process of investigation the dimensions,
identity, location and metrics of measurement of regions of interest in the
experimental space are all iteratively evolving. The problem of choosing
suitable experimental designs in such a context is a difficult one. Some
properties ([5], [8]) of a response surface design, any or all of which
might be of importance in different circumstances in the above context, are
listed in Table 1.

4 Empirical evidence suggests this. Also, integration of sets of differential
equations which describe the kinetics of chemical systems almost invariably
leads to ridge systems ([4], [15], [26], [41]). See also the discussion of
Figure d.

The  design  information  function 

Associated with requirements (i) and (ii) of Table 1, consider the
design variance function [11]

$$V_x = n\,\mathrm{Var}(\hat y_x)/\sigma^2 = n\,x'(X'X)^{-1}x$$

or equivalently the information function

$$I_x = V_x^{-1}.$$


The  design  should: 

(i) generate a satisfactory distribution of information throughout the region of
interest, R;

(ii) ensure that the fitted value at x, $\hat y(x)$, be as close as possible to the true value at x, $\eta(x)$;

(iii)  give  good  detectability  of  lack  of  fit; 

(iv)  allow  transformations  to  be  estimated; 

(v)  allow  experiments  to  be  performed  in  blocks; 

(vi)  allow  designs  of  increasing  order  to  be  built  up  sequentially; 

(vii)  provide  an  internal  estimate  of  error; 

(viii)  be  insensitive  to  wild  observations  and  to  violation  of  the  usual  normal  theory 
assumptions; 

(ix)  require  a  minimum  number  of  experimental  points; 

(x)  provide  simple  data  patterns  that  allow  ready  visual  appreciation; 

(xi)  ensure  simplicity  of  calculation; 

(xii)  behave  well  when  errors  occur  in  the  settings  of  the  predictor  variables,  the  x’s; 
(xiii)  not  require  an  impractically  large  number  of  predictor  variable  levels; 

(xiv)  provide  a  check  on  the  ‘constancy  of  variance’  assumption. 


TABLE 1. SOME ATTRIBUTES OF DESIGNS OF POTENTIAL IMPORTANCE
It is evident that if we were to make the unrealistic assumption (made in
alphabetic optimality) that the graduating function $\eta = x'\beta$ is capable of
exactly representing the true function $g(\xi)$, then the information function
would tell us all we could know about the design's ability to estimate $\eta$.

For illustration, information functions and associated information contours
for a 2² factorial used as a first order design and for a 3² factorial used as
a second order design are shown in Figures 2 and 3, for standard variables $x_1$
and $x_2$.
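For the 2² factorial used as a first order design, the variance and information functions can be written down directly. Here $X'X = 4I$, so $V_x = 1 + x_1^2 + x_2^2$ and $I_x = 1/(1 + \rho^2)$ with $\rho^2 = x_1^2 + x_2^2$. The small sketch below computes $V_x = n\,x'(X'X)^{-1}x$ and $I_x = 1/V_x$ for an arbitrary model matrix; the helper names are hypothetical. At the vertices $V_x$ reaches $p = 3$, which is consistent with this design being D/G-optimal for the square.

```python
import numpy as np


def first_order_terms(x):
    """Regressors for a first degree model in two variables: 1, x1, x2."""
    return [1.0, x[0], x[1]]


def variance_function(X, terms, x):
    """V_x = n f(x)'(X'X)^{-1} f(x) for an n-run design with model matrix X."""
    n = X.shape[0]
    fx = np.asarray(terms(x), dtype=float)
    return float(n * fx @ np.linalg.solve(X.T @ X, fx))


def information_function(X, terms, x):
    """I_x = 1 / V_x."""
    return 1.0 / variance_function(X, terms, x)


# 2^2 factorial used as a first order design
points = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
X = np.array([first_order_terms(pt) for pt in points], dtype=float)

for x in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    V = variance_function(X, first_order_terms, x)
    print(x, round(V, 3), round(1.0 / V, 3))
```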


4.  APPLICABILITY  OF  ALPHABETIC  OPTIMALITY 


The information function for Fedorov's second order D/G-optimal design
over the square RO region with vertices (±1, ±1), referred to earlier, is shown in
Figure 4. For illustration, this is related to the two experimental variables
$\xi_1$ = temperature in °C and $\xi_2$ = time in hours. Thus, in this particular example,
$x_1 = (\xi_1 - 180)/10$ and $x_2 = \xi_2 - 4$, and the RO region would permit
experimentation within the limits $\xi_1$ = 170 to 190 °C and $\xi_2$ = 3 to 5 hours,
but not outside these limits. In the response surface context a number of
questions arise concerning the appropriateness of the specification set out in
Section 2 of this paper for alphabetic optimality. These concern
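The coding used for Figure 4 is a simple affine map between natural and standardized units. A trivial sketch follows; the helper names `to_coded` and `to_natural` are hypothetical.

```python
def to_coded(temp_c, time_h):
    """Natural units (temperature in deg C, time in hours) -> coded x1, x2 of Figure 4."""
    return (temp_c - 180.0) / 10.0, time_h - 4.0


def to_natural(x1, x2):
    """Coded x1, x2 -> natural units (temperature in deg C, time in hours)."""
    return 180.0 + 10.0 * x1, 4.0 + x2


# The corners of the RO square map back to the stated limits
corners = [to_natural(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
print(corners)
```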

(i) Formulation in terms of the RO region
(ii) Distribution of information over a wider region
(iii) Sensitivity of criteria to size and shape of the RO region
(iv) Ignoring of bias
Formulation in terms of the RO region

As has been pointed out, in response surface studies it is typically
true that at any given stage of an investigation the current region of
interest R is much smaller than the region of operability O, which is, in any
case, usually unknown. In particular, this must obviously be so for any
investigation in which we allow the possibility that the results of one design
may allow progress to a different, unexplored region. Consequently I believe
that formulation in terms of an RO region, which assumes that R and O are
identical, is artificial and limiting. In particular, to obtain a good
approximation within R one may very well wish to put some experimental
points outside R, and so long as they are within O there is no practical
reason why we should not. Also, since R is typically only vaguely known, we
will want to consider the information function over a wider region, as is done
for example in Figure 5 for Fedorov's second order D-optimal design. The
information function for this design may now be compared over this wider
region with that for the 3² factorial in Figure 3.

Distribution  of  information  over  a  wider  region 


In the response surface context, the coefficients $\beta$ of a graduating
function $\eta = x'\beta$, acting as they do merely as adjustments to a kind of
mathematical french curve, are not usually of individual interest except
insofar as they affect $\eta$, in which case only the G-optimality criterion among
those considered is of direct interest. For response surface studies, however,
it is far from clear how desirable the property of G-optimality itself is.

Figure 5. Information function for a second order D/G-optimal design over a wider region.

For instance, the profiles of Figure 6, made by taking sections of the
surfaces of Figure 3 and Figure 5, suggest that neither the G/D-optimal design
nor the 3² design is universally superior to the other. In some
subregions one design is slightly better, and in others the other design is
slightly better. Both information functions, and particularly that of the
G/D-optimal design, show a tendency to sag in the middle. This happens for
the G/D-optimal design because the G-optimality characteristic guarantees that
(maximized) minima for $I_x$, each equal to $1/p$, occur at every design point,
which must include the center point. However, this sagging information
pattern of the second order design is not, of course, a characteristic of the
first order design of Figure 2, which is also D/G-optimal but contains no
center point. If the idea of the desirability of designs possessing a
particular kind of information profile is basic, then it seems unsatisfactory
that the nature of that profile should depend so very much on the order of the
design. Indeed, the relevance of the minimax criterion which produces
G-optimality is arguable. It follows from the Kiefer-Wolfowitz theorem that a
second order design for the (±1, ±1) region whose information function did not
sag in the middle would necessarily not be D-optimal. But as we have seen,
D-optimality is only one of many single-valued criteria that might be used in
attempts to describe some important characteristic of the X'X matrix. Others,
for example, would be A-optimality and E-optimality, and these would yield
different information profiles. But I would argue that, since the information
function itself is the most direct measure of desirability so far as the
single issue of variance properties is concerned, our best course is to choose
our design directly by picking a suitable information function, and not
indirectly by finding some extremum for A, E, D, or other arbitrary criterion.
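The claim that neither design dominates can be spot-checked by comparing the two information functions at a few points. The sketch below assumes per-run normalized information matrices, so that the 9-run factorial and the measure design are compared on an equal number-of-runs basis. It uses the rounded Fedorov weights quoted in Section 2, and the helper names are illustrative. Under this convention the 3² factorial has the larger information at the centre, while the D/G-optimal design has the larger information at a vertex.

```python
import itertools

import numpy as np


def second_order_terms(x1, x2):
    """Regressors for the quadratic model of equation (1)."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])


def moment_matrix(points, weights):
    """Per-run information matrix sum_i w_i f(x_i) f(x_i)'."""
    return sum(w * np.outer(second_order_terms(*pt), second_order_terms(*pt))
               for pt, w in zip(points, weights))


def information(M, x1, x2):
    """I_x = 1 / (f(x)' M^{-1} f(x))."""
    fx = second_order_terms(x1, x2)
    return 1.0 / float(fx @ np.linalg.solve(M, fx))


factorial_pts = list(itertools.product((-1.0, 0.0, 1.0), repeat=2))
M_fact = moment_matrix(factorial_pts, [1 / 9] * 9)

opt_pts = ([(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
           + [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.0, 0.0)])
M_opt = moment_matrix(opt_pts, [0.146] * 4 + [0.080] * 4 + [0.096])

# A few points along x2 = 0, plus one vertex of the square
for x in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, round(information(M_fact, *x), 3), round(information(M_opt, *x), 3))
```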
Sensitivity of criteria to size and shape

In  the  process  of  scientific  investigation,  the  investigator  and  the 
statistician  must  do  a  great  deal  of  guesswork.  In  matching  the  region  of 
interest  R  and  the  degree  of  complexity  of  the  approximating  function,  they 
must  try  to  take  into  account,  for  example,  that  a  more  flexible  second  degree 
approximating  polynomial  can  be  expected  to  be  adequate  over  a  larger  region 
R  than  a  first  degree  approximation.  Obviously  different  experimenters  would 
have different ideas of appropriate locations and ranges for experimental
variables. In particular, ranges could easily differ from one experimenter to
another by a factor of two or more⁵. In view of this, extreme sensitivity of
design criteria to scaling is disturbing⁶. For example, suppose each
dimension of a dth order experimental design is increased⁷ by a factor c. Then
the D criterion is increased by a factor of $c^q$, where

$$q = \frac{2k\,(k+d)!}{(k+1)!\,(d-1)!}.$$

5 Over a sequence of designs, initial bad choices of scale and location would
tend to be corrected, of course.

6 In particular, designs can only be fairly compared if they are first scaled
to be of the "same size." But how is size to be measured? It was suggested
in [14] that designs should be judged as being of the same size when their
marginal second moments $\sum_u (x_{iu} - \bar{x}_i)^2/n$ were identical. This convention is
not entirely satisfactory, but will of course give very different results from
those which assume design points to be all included in the same region RO. It
is important to be aware that the apparent superiority of one design over
another will often disappear if the method of scaling the design is changed.
In particular this applies to comparisons such as those made by Nalimov et al.
[40] and Lucas [38].

7 A measure of efficiency of a design criterion (see for example [3], [17]) is
motivated by considering the ratio of the number of runs needed by the
optimal design to the number of runs required by the suboptimal design to
obtain the same value of the criterion (supposing fractional numbers of runs
to be allowed). In particular, for the D criterion, this measure of
D-efficiency is $(D/D_{\mathrm{opt}})^{1/p}$. Equivalently here, to illustrate scale
sensitivity, we concentrate attention on the factor c by which each scale
would need to be inflated to achieve the same value of the D criterion.
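A short counting argument shows where q comes from. This is a sketch, assuming that the D criterion is $|X'X|$ for the full dth degree polynomial in k standardized variables. Stretching every axis by c multiplies a regressor of total degree j by $c^j$. Hence $X'X \to S\,X'X\,S$ with S diagonal, and $|X'X| \to |S|^2\,|X'X|$. Since there are $\binom{j+k-1}{k-1}$ monomials of degree j in k variables,

$$
\log_c |S| = \sum_{j=0}^{d} j\binom{j+k-1}{k-1} = k\sum_{j=1}^{d}\binom{j+k-1}{k} = k\binom{d+k}{k+1} = \frac{k\,(k+d)!}{(k+1)!\,(d-1)!},
$$

so that $q = 2\log_c|S| = 2k\,(k+d)!/[(k+1)!\,(d-1)!]$. For k = d = 2 the six terms of equation (1) have degrees 0, 1, 1, 2, 2, 2, which sum to 8, giving q = 16.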


Equivalently, a confidence region of the same volume as that for a D-optimal
design can be achieved for a design of given D value by increasing the scale
for each variable by a factor of $c = (D_{\mathrm{opt}}/D)^{1/q}$, thus increasing the volume
occupied by the design in the x space by a factor $c^k = (D_{\mathrm{opt}}/D)^{k/q}$. For
example, the D value for the 3² factorial design of Figure 3 is $0.98 \times 10^{-2}$, as
compared with a D value of $1.14 \times 10^{-2}$ for the D-optimal design. For (k = 2,
d = 2) we find q = 16, and $c = (1.14/0.98)^{1/16} = 1.009$. Thus the same value
of D (the same volume of a confidence region for the $\beta$'s) as is obtained for
the D-optimal design would be obtained from a 3² design if each side of the
square region were increased by less than 1%. Equivalently, the area of the
region would be increased by less than 2%. Using the scaling that was used in
Figure 4 for illustration, we should have to vary the temperature over a range of
20.18 °C instead of 20 °C, and the time over two hours and one minute instead of
two hours, for the factorial to give the same D value as the D/G-optimal
design. Obviously no experimenter can guess to anything approaching this
accuracy what are suitable ranges over which to vary these factors.
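The figures in this paragraph can be reproduced approximately with the sketch below. It assumes that the D value is the determinant of the per-run moment matrix $X'X/n$ (for the measure design, $\sum_i w_i f(x_i)f(x_i)'$) and uses the rounded Fedorov weights. It also evaluates the D-efficiency $(D/D_{\mathrm{opt}})^{1/p}$ of footnote 7. With unrounded determinants (about 0.00976 and 0.01143), c comes out close to 1.010 rather than 1.009. The temperature and time ranges therefore change slightly, but a rescaling of about 1% still suffices.

```python
import itertools
from math import factorial

import numpy as np


def second_order_terms(x1, x2):
    """Regressors for the quadratic model of equation (1)."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])


def d_value(points, weights):
    """Determinant of the per-run moment matrix sum_i w_i f(x_i) f(x_i)'."""
    M = sum(w * np.outer(second_order_terms(*pt), second_order_terms(*pt))
            for pt, w in zip(points, weights))
    return float(np.linalg.det(M))


def q_exponent(k, d):
    """Exponent q such that D is multiplied by c**q when every axis is stretched by c."""
    return 2 * k * factorial(k + d) // (factorial(k + 1) * factorial(d - 1))


# 3^2 factorial: 9 runs, each carrying weight 1/9
factorial_pts = list(itertools.product((-1.0, 0.0, 1.0), repeat=2))
D_fact = d_value(factorial_pts, [1 / 9] * 9)

# Fedorov's D/G-optimal measure on the square, with the rounded weights from Section 2
opt_pts = ([(sx, sy) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]
           + [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.0, 0.0)])
D_opt = d_value(opt_pts, [0.146] * 4 + [0.080] * 4 + [0.096])

k, d, p = 2, 2, 6
q = q_exponent(k, d)
c = (D_opt / D_fact) ** (1 / q)              # stretch needed on each axis
area_factor = c ** k                         # corresponding change in area
d_efficiency = (D_fact / D_opt) ** (1 / p)   # footnote 7

print(f"D_fact = {D_fact:.4g}, D_opt = {D_opt:.4g}, q = {q}, c = {c:.4f}")
print(f"temperature range {20 * c:.2f} C, time range {120 * c:.1f} min")
```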

Obviously choice of region and choice of information function are
closely interlinked. For example, any set of N = k+1 points in k-space which
have no coplanarities is obviously a D-optimal first order design for some
ellipsoidal region⁸. Furthermore, the information function for a design of
order d is a smooth function whose harmonic average over the n experimental
points (which can presumably be regarded as representative of the region of
interest) is always 1/p wherever we place the points. Thus the problem of
design is not so much a question of choosing the design to increase total
information as of spreading the total information around in the manner desired.

8 Namely, for that region enclosed within the information contour $I_x = 1/p$,
which must pass through all the k+1 experimental points.
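The statement that the harmonic average is 1/p follows from a trace identity: $\sum_u x_u'(X'X)^{-1}x_u = \operatorname{tr}[(X'X)^{-1}X'X] = p$. The mean of $V_x$ over the n design points is therefore p, and the harmonic mean of $I_x$ there is 1/p. The sketch below checks this for an arbitrary 12-point design and the quadratic of equation (1). The design is generated at random and is purely illustrative.

```python
import numpy as np


def second_order_terms(x1, x2):
    """Regressors for the quadratic model of equation (1); p = 6."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])


rng = np.random.default_rng(0)
n = 12
points = rng.uniform(-1.5, 1.5, size=(n, 2))   # arbitrary design, for illustration only
X = np.array([second_order_terms(a, b) for a, b in points])
XtX_inv = np.linalg.inv(X.T @ X)

# V_x = n x'(X'X)^{-1} x evaluated at each of the n design points
V = np.array([n * X[u] @ XtX_inv @ X[u] for u in range(n)])
I = 1.0 / V
harmonic_mean_I = n / np.sum(1.0 / I)

print(round(V.mean(), 6), round(harmonic_mean_I, 6))
```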

Rotatable  Designs 

A route for simplification different from alphabetic optimality occurs
when, after suitable transformation of the inputs X to standardized
variables x, nothing is known about the orientation in the X space of the
response surface we wish to study. It was argued by Box and Hunter [11] that
we should then employ designs having the property that the variance of $\hat{y}$ is
a function only of $\rho = (\mathbf{x}'\mathbf{x})^{1/2}$, so that

$V_x = V_\rho$ and $I_x = I_\rho$.

For a first order design, rotatability implies orthogonality and vice
versa, and completely decides the information function. For second and higher
order designs, a requirement of rotatability fixes many moment properties of
the design, but $V_\rho$, and hence $I_\rho$, are still to some extent at our choice, and
can be changed by changing certain moment ratios [11]. In particular, for a
second order design, $V_\rho$ depends on the single moment ratio
$\lambda = (n/3)\sum_u x_{iu}^4 / (\sum_u x_{iu}^2)^2$. For illustration, Figure 7 shows the information
function for a second order rotatable design with λ = 0.75, consisting of 8
points arranged in a regular octagon with 4 points at the center.
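The value λ = 0.75 for the octagon design can be confirmed directly. This plain-Python sketch computes λ from the definition above and also checks the fourth-moment condition $\sum x_1^4 = 3\sum x_1^2 x_2^2$ that second order rotatability requires; the variable names are illustrative.

```python
import math

# Regular octagon of radius r plus 4 center points (n = 12), as described for Figure 7.
r = 1.0  # lambda is unchanged by rescaling, so any radius will do
octagon = [(r * math.cos(j * math.pi / 4), r * math.sin(j * math.pi / 4)) for j in range(8)]
points = octagon + [(0.0, 0.0)] * 4
n = len(points)

# Moments of x1 (those of x2 are identical by symmetry)
sum_x2 = sum(x1**2 for x1, _ in points)
sum_x4 = sum(x1**4 for x1, _ in points)
sum_x1x2_sq = sum((x1 * x2) ** 2 for x1, x2 in points)

lam = (n / 3) * sum_x4 / sum_x2**2   # moment ratio lambda
print(round(lam, 6))                 # 0.75

# Second order rotatability also requires sum x1^4 = 3 * sum x1^2 x2^2.
print(math.isclose(sum_x4, 3 * sum_x1x2_sq))  # True
```

Changing the number of center points changes n, and hence λ and the shape of the information function, without disturbing rotatability.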

The truth seems to be that at any particular phase of an investigation
the scientific decision that most contributes to the outcome of that phase is
the choice of the current region of interest (involving choice of variables,
locations, ranges, and transformations); this is a choice that does not
really involve statistics. After this decision is made (and given the
assumption that the model fits perfectly, so that only the variance properties
of the design are of interest), any set of experiments that covers this region
in some reasonably uniform way is likely to do quite well. I cannot see that
the various optimality criteria are particularly relevant to this choice,
although there would certainly be no harm in considering them, together with
many other factors briefly discussed later.


Ignoring  of  bias 

All  models  are  wrong;  some  models  are  useful.  This  aphorism  is 
particularly  true  for  empirical  functions  such  as  polynomials  that  make  no 
claim  to  do  more  than  locally  graduate  the  true  function.  For  chemical 
examples some idea of the adequacy of such approximations can be gained by
studying surfaces produced by chemical kinetic models. An example* taken from
[10] is shown in Figure 8. See also [15].

One conclusion I reached from many such studies was that approximations
would not need to be very good for response surface methods to work. Thus
within region A of Figure 8 the locally monotonic function could be crudely
approximated by a plane, which could indicate a useful path of ascent. Also,
valuable information might be obtained about a ridge such as that in region B,
even though the underlying surface was not exactly quadratic. Notice, however,
that in the light of such examples any theory of experimental design which
depended on the exactness of such approximations should be regarded with some
skepticism.

* This surface was generated (see [10] for details) by considering the yield
of the product B in a consecutive reaction $A \xrightarrow{k_1} B \xrightarrow{k_2} C$ following first
order kinetics, with temperature sensitivity given by the Arrhenius relation
$\ln k_i = \ln \alpha_i + \beta_i/T$, where the temperature T is measured in kelvins,
using plausible values for the constants $\alpha_1, \alpha_2, \beta_1, \beta_2$.

Figure 8. Contours of a theoretical response surface in reaction time (hours)
and reaction temperature for a first order consecutive reaction, with plausible
values substituted for kinetic constants.
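A surface of the kind shown in Figure 8 can be generated in a few lines. For first order steps A → B → C starting from pure A, the standard result is that the fraction present as B after time t is $\frac{k_1}{k_2 - k_1}(e^{-k_1 t} - e^{-k_2 t})$, or $k t e^{-k t}$ when $k_1 = k_2 = k$. The Arrhenius constants below are hypothetical placeholders, not the values used in [10], so the numbers printed will not reproduce Figure 8; only the construction is illustrated.

```python
import math

# HYPOTHETICAL Arrhenius constants, NOT those used in [10]:
#   ln k_i = ln(alpha_i) + beta_i / T, rates in 1/hour, T in kelvins.
LN_ALPHA = (20.0, 28.0)
BETA = (-8000.0, -12000.0)   # negative, so both rates increase with temperature


def rate_constants(temp_k):
    return tuple(math.exp(la + b / temp_k) for la, b in zip(LN_ALPHA, BETA))


def yield_of_b(time_h, temp_k):
    """Fraction of the initial A present as B after time_h hours (A -> B -> C, pure A at t = 0)."""
    k1, k2 = rate_constants(temp_k)
    if math.isclose(k1, k2):
        # Limiting form when the two rate constants coincide
        return k1 * time_h * math.exp(-k1 * time_h)
    return k1 / (k2 - k1) * (math.exp(-k1 * time_h) - math.exp(-k2 * time_h))


# Tabulate a small piece of the surface over time and temperature
for temp_k in (420.0, 440.0, 460.0):
    row = [yield_of_b(t, temp_k) for t in (0.25, 0.5, 1.0, 2.0)]
    print(temp_k, [f"{y:.3f}" for y in row])
```

Because the two activation terms differ, the optimal time shifts with temperature, which is what produces a diagonal ridge like that in region B.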


5.  TAKING  ACCOUNT  OF  BIAS 


If $\hat{y}_x = \mathbf{x}'\mathbf{b}$ is the fitted value using the empirical approximation, then
its total error $e = \hat{y}_x - \eta_x$ is

$\hat{y}_x - \eta_x = \{\hat{y}_x - E(\hat{y}_x)\} + \{E(\hat{y}_x) - \eta_x\}$.

Thus the error e contains a random part $e_R$ and a systematic, or bias, part
$e_B$, and we must expect that $e_B$ will not be negligible. Since all the theory
previously discussed makes the assumption that $e_B$ is zero, we must consider

whether  the  resulting  designs  are  robust  to  this  kind  of  discrepancy.  The 
optimality  criteria  discussed  earlier  which  assume  the  response  function  to  be 
exact  usually  produce  a  substantial  proportion  of  experimental  points  on  the 
boundary  of  RO.  In  the  context  of  possible  bias,  this  is  not  reassuring, 
since  it  is  at  these  points  that  the  approximating  function  will  be  most 
strained. 

The explicit recognition that bias will certainly be present does,
however, seem to provide a more rational means of approaching the scaling
problem ([6], [7]). To see this, consider again the formulation given earlier
in terms of a region of interest R and a larger region O of operability. If
we were to assume (unrealistically) that the approximation remained exact
however widely the points were spread, and if some measure of variance
reduction were the only consideration, then to obtain the most accurate
estimation within R, the size of the design would have to be increased to the
boundaries of the operability region O. But in fact, of course, the wider the
points were spread, the less applicable would be the approximating function,
and the bigger the bias error. This suggests that we should seek to restrict
the spread of the experimental points not by artificial limitation to some
region RO, but by balancing off the competing requirements of variance on the
one hand, which is reduced as the spread of the points is increased, and bias
on the other hand, which is increased as the spread of the points is increased.

The mean square error associated with estimating $\eta_x$ by $\hat{y}_x$,
standardized for the number n of design points and the error variance $\sigma^2$,
can be written as the sum of a variance component and a squared bias component:

$\frac{n E(\hat{y}_x - \eta_x)^2}{\sigma^2} = \frac{n \operatorname{Var}(\hat{y}_x)}{\sigma^2} + \frac{n \{E(\hat{y}_x) - \eta_x\}^2}{\sigma^2}$,

or

$M_x = V_x + B_x$.

For illustration, an example is taken from a forthcoming book with N.R.
Draper and J.S. Hunter [10]. Figure 9 shows a situation as it might exist for
a single variable when a straight line approximating function is to be used.
The diagram shows what might be the true underlying function, which would of
course be obscured by experimental error. Suppose the region of interest R
is scaled so that $-1 \le x \le 1$, and consider in particular the two designs

(a) (-2/3, 0, 2/3) and (b) (-4/3, 0, 4/3),

each of the form $(-x_0, 0, x_0)$.


One way [6] to obtain overall measures of variance and squared bias
over any specified region of interest R is by averaging $V_x$ and $B_x$ over
R to provide the quantities

$V = \frac{\int_R V_x \, dx}{\int_R dx}$ and $B = \frac{\int_R B_x \, dx}{\int_R dx}$.

Figure 9. Two possible designs for fitting a straight line over a region of
interest.

Denoting  the  integrated  (over  R)  mean  square  error  by  M,  we  can  then  write 

M  =  V  +  B. 


For the previous example, V, B, and M are plotted against $x_0$ in Figure 10.
We see how V becomes very large if the spread of the design is made very
small, while if the design is made very large, V slowly approaches its minimum
value of unity. The average squared bias B, on the other hand, has a minimum
value when $x_0$ is about 0.7, and increases for larger or smaller designs. A
rather flat minimum for M = V + B occurs near $x_0 = 0.79$. Thus in this example
the design which minimizes the average mean squared error M is not very
different from the design which minimizes the average squared bias B, but is
extremely different from that which minimizes the average variance V.
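The pattern just described can be reproduced under one explicit assumption: that the true function departs from the straight line only through a quadratic term $\beta_{11}x^2$. Taking R as $-1 \le x \le 1$ and the design $(-x_0, 0, x_0)$, the fitted line has $V_x = 1 + 3x^2/(2x_0^2)$ and bias $\beta_{11}(2x_0^2/3 - x^2)$, which give the closed forms in the sketch. The ratio `gamma` $= 3\beta_{11}^2/\sigma^2$ is hypothetical, so the location of the minimum of M will not match Figure 10 exactly, but the minimum of B at $x_0 = 1/\sqrt{2} \approx 0.707$ does not depend on it.

```python
def avg_variance(x0):
    """V: average over R = [-1, 1] of V_x = 1 + 3x^2 / (2 x0^2) for the design (-x0, 0, x0)."""
    return 1.0 + 1.0 / (2.0 * x0**2)


def avg_sq_bias(x0, gamma):
    """B when the truth adds a term beta11 * x^2 to the straight line.

    gamma = n * beta11**2 / sigma**2 with n = 3 is a HYPOTHETICAL size for the
    feared quadratic term. The bias at x is beta11 * (c - x^2) with
    c = 2 x0^2 / 3, and the average of (c - x^2)^2 over [-1, 1] is
    c^2 - 2c/3 + 1/5.
    """
    c = 2.0 * x0**2 / 3.0
    return gamma * (c * c - 2.0 * c / 3.0 + 0.2)


def argmin_on_grid(f, lo=0.2, hi=2.0, steps=18001):
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=f)


gamma = 10.0   # hypothetical; not read off Figure 9
x_bias = argmin_on_grid(lambda x0: avg_sq_bias(x0, gamma))
x_mse = argmin_on_grid(lambda x0: avg_variance(x0) + avg_sq_bias(x0, gamma))
x_var = argmin_on_grid(avg_variance)

print(f"B smallest near x0 = {x_bias:.3f}")   # 1/sqrt(2), about 0.707
print(f"M smallest near x0 = {x_mse:.3f}")    # somewhat wider than the minimum-bias design
print(f"V keeps falling: x0 = {x_var:.1f}")   # the edge of the grid
```

Since V is still decreasing at $x_0 = 1/\sqrt{2}$ while B is stationary there, the minimum of M always lies a little beyond the minimum-bias design, whatever positive value `gamma` takes.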

Choice  of  alternative  model 

A difficulty in all this is that in practice we do not know the nature
of the true function $\eta_x$. Progress may be made, however, by supposing that
$\eta_x$ is, to some satisfactory approximation, represented by a polynomial model
of higher degree $d_2$. Suppose then that a polynomial model of degree $d_1$ is
fitted to n data values to give

$\hat{y}_x = \mathbf{x}_1'\mathbf{b}_1$,

while the true model is in fact a polynomial of degree $d_2$, so that

$\eta_x = \mathbf{x}_1'\boldsymbol{\beta}_1 + \mathbf{x}_2'\boldsymbol{\beta}_2$.
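Under this setup the standard least squares result is $E(\mathbf{b}_1) = \boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2$, with alias matrix $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$, so that the bias is $E(\hat{y}_x) - \eta_x = \mathbf{x}_1'\mathbf{A}\boldsymbol{\beta}_2 - \mathbf{x}_2'\boldsymbol{\beta}_2$. A minimal NumPy sketch for the straight line design (a) above, with a quadratic term as the feared alternative:

```python
import numpy as np


def alias_matrix(X1, X2):
    """A = (X1'X1)^{-1} X1'X2, so that E(b1) = beta1 + A beta2."""
    return np.linalg.solve(X1.T @ X1, X1.T @ X2)


x0 = 2.0 / 3.0                               # design (a): (-2/3, 0, 2/3)
x = np.array([-x0, 0.0, x0])
X1 = np.column_stack([np.ones_like(x), x])   # fitted straight line, d1 = 1
X2 = (x**2)[:, None]                         # feared quadratic term, d2 = 2
A = alias_matrix(X1, X2)

# Intercept picks up (2 x0^2 / 3) * beta11; the slope is unbiased by symmetry.
print(A.ravel())
```

The alias matrix depends only on the design, which is why the design can be chosen to control bias before any data are collected.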


We also need to know something about the relative magnitudes of the
systematic and random errors that we could expect to meet in practical
cases. It was argued in [6] that an investigator might typically employ a
fitted approximating function such as a straight line when he believed that
the average departure from the truth induced by this approximating function
was no worse than that induced by the process of fitting. This would suggest
that the experimenter would tend to choose the size of his region R, and the
degree of his approximating function, in such a way that the integrated random
error and the integrated systematic error were about equal. Thus we might
suppose that a situation of particular interest is that where B is roughly
equal to V. Examples that we studied seemed to show that designs that
minimized M subject to the constraint V = B were close to those which minimized B.
Consequently we suggested that, if a simplification were to be made in the
design problem, it might almost be better to ignore the effects of sampling
variation than those of bias.

However that may be, there seems no doubt that, in making a table of
useful designs, one component of our thinking should be the characteristics of
the designs which minimize squared bias against feared alternatives. As a
factor in our final choice, this should certainly receive as much attention as
the indications supplied by, say, D-optimality.

For illustration, particular examples of designs in three dimensions
which minimize integrated squared bias when R is a sphere of unit radius are
shown in Figure 11(a) for $d_1 = 1$ and $d_2 = 2$ (a first order design robust to
second order effects) and in Figure 11(b) for $d_1 = 2$ and $d_2 = 3$ (a second
order design robust to third order effects). The former is the familiar $2^3$
factorial scaled so that the points are about 0.77 units from the center. The
latter is a rotatable composite design with "cube" points at a distance 0.86
from the center, and "star" points at a distance 0.83 from the center.
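The two distances quoted for Figure 11(b) are linked by the rotatability requirement. With $n_c = 8$ cube points at $(\pm a, \pm a, \pm a)$ and star points at distance α on the axes, $\sum x_1^4 = n_c a^4 + 2\alpha^4$ and $\sum x_1^2 x_2^2 = n_c a^4$ (center points contribute nothing), so the condition $\sum x_1^4 = 3\sum x_1^2 x_2^2$ gives $\alpha^4 = n_c a^4$. A quick check in Python, with illustrative names:

```python
import math

N_CUBE = 8                             # 2^3 cube points (+-a, +-a, +-a)
cube_distance = 0.86                   # distance of the cube points quoted for Figure 11(b)
a = cube_distance / math.sqrt(3)       # half-side of the cube
alpha = (N_CUBE * a**4) ** 0.25        # rotatable star distance: alpha^4 = n_c * a^4

# Verify the rotatability moment condition directly on the 14 non-center points.
cube = [(sx * a, sy * a, sz * a) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
star = [tuple(alpha * s if j == i else 0.0 for j in range(3)) for i in range(3) for s in (-1, 1)]
points = cube + star
sum_x1_4 = sum(x[0] ** 4 for x in points)
sum_x1x2_2 = sum((x[0] * x[1]) ** 2 for x in points)

print(round(alpha, 3))                              # 0.835
print(math.isclose(sum_x1_4, 3 * sum_x1x2_2))       # True
```

Given that 0.86 is itself rounded, the computed star distance agrees with the quoted 0.83 to within rounding.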


Figure  11(a)  A  first  order  (two-level  factorial) 

design  in  three  factors  which  minimizes 
squared  bias  from  second  order  terms 
when  the  region  of  interest  is  a  sphere 
of  unit  radius. 


Figure  11(b)  A  second  order  composite  rotatable  design  which 
minimizes  squared  bias  from  third  order  terms 
when  the  weight  function  is  uniform  over  a 
spherical  region  of  interest  of  unit  radius. 


Figure 12. Two possible weight functions for k = 1:

(a) "Uniform over R" type, indicating uniform
interest over R and no interest outside R;

(b) Normal distribution shape, giving greater
weight to points nearer P.
Obviously in practice, because of the inevitable inexactness of
choosing scales, the exact dimensions of the designs should not be taken too
seriously, but these examples illustrate the fact that as soon as we take
account of bias, design points are not chosen on the boundary of R.

Choice  of  designs  which  minimize  bias 

Before considering the problem of choosing minimum bias designs, it is
desirable to generalize slightly the previous formulation. Although it avoids
limiting the location of the design points in an artificial way, the idea of a
region of interest R within a larger operability region O is still not
entirely satisfactory, because it implies that we have equal interest at all
points within R. A more general formulation [7], which subsumes the one we have
been discussing, employs a weight function w(x) which extends over the
operability region O so that $\int_O w(\mathbf{x})\,d\mathbf{x} = 1$. The weighted mean square error M
can now be split into a weighted variance part V and a weighted squared bias
part B, so that again M = V + B, with

$M = \frac{n}{\sigma^2}\int_O w(\mathbf{x})\, E\{\hat{y}(\mathbf{x}) - \eta(\mathbf{x})\}^2 \, d\mathbf{x}$,

$V = \frac{n}{\sigma^2}\int_O w(\mathbf{x})\, E\{\hat{y}(\mathbf{x}) - E(\hat{y}(\mathbf{x}))\}^2 \, d\mathbf{x}$,

$B = \frac{n}{\sigma^2}\int_O w(\mathbf{x})\, \{E(\hat{y}(\mathbf{x})) - \eta(\mathbf{x})\}^2 \, d\mathbf{x}$.

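Two possible weight functions for k = 1 [20] are shown in Figure 12.

Suppose as before that the fitted function is a polynomial $\mathbf{x}_1'\mathbf{b}_1$ of
degree $d_1$, while the true model is a polynomial $\mathbf{x}_1'\boldsymbol{\beta}_1 + \mathbf{x}_2'\boldsymbol{\beta}_2$ of degree $d_2$,
and define moment matrices for the design and for the weight function by

$\mathbf{M}_{11} = n^{-1}\mathbf{X}_1'\mathbf{X}_1, \quad \mathbf{M}_{12} = n^{-1}\mathbf{X}_1'\mathbf{X}_2$,

$\boldsymbol{\mu}_{11} = \int_O w(\mathbf{x})\,\mathbf{x}_1\mathbf{x}_1'\,d\mathbf{x}, \quad \boldsymbol{\mu}_{12} = \int_O w(\mathbf{x})\,\mathbf{x}_1\mathbf{x}_2'\,d\mathbf{x}$,

where $\mathbf{X}_1$ and $\mathbf{X}_2$ have rows $\mathbf{x}_1'$ and $\mathbf{x}_2'$ evaluated at the n design points.
Then [6] a necessary and sufficient condition for the squared bias B to be
minimized is that

$\mathbf{M}_{11}^{-1}\mathbf{M}_{12} = \boldsymbol{\mu}_{11}^{-1}\boldsymbol{\mu}_{12}$,

and hence a sufficient condition is that all the moments of the design up to
and including order $d_1 + d_2$ are equal to the corresponding moments of the
weight function.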
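The moment condition can be checked numerically for a $2^3$ factorial when w(x) is uniform over a sphere of unit radius, as in Figure 11(a). The uniform-ball moments used ($E(x_i^2) = 1/5$ in three dimensions, all other first and second order moments zero) are standard; the rest is an illustrative NumPy sketch. Matching second moments puts each factorial point at distance $\sqrt{0.6} \approx 0.77$ from the center.

```python
import itertools

import numpy as np


def first_order_terms(pt):
    return [1.0, *pt]


def second_order_terms(pt):
    a, b, c = pt
    return [a * a, b * b, c * c, a * b, a * c, b * c]


# Moments of a weight function uniform over the unit ball in k = 3:
# E(x_i^2) = 1/5, and all other first and second order moments vanish.
mu11 = np.diag([1.0, 0.2, 0.2, 0.2])
mu12 = np.zeros((4, 6))
mu12[0, :3] = 0.2
target = np.linalg.solve(mu11, mu12)

# 2^3 factorial with half-side h chosen so that its second moments are also 1/5
h = np.sqrt(0.2)
pts = [tuple(h * s for s in signs) for signs in itertools.product([-1.0, 1.0], repeat=3)]
X1 = np.array([first_order_terms(p) for p in pts])
X2 = np.array([second_order_terms(p) for p in pts])
n = len(pts)
design = np.linalg.solve(X1.T @ X1 / n, X1.T @ X2 / n)

print(np.allclose(design, target))      # True: M11^{-1} M12 equals mu11^{-1} mu12
print(round(float(np.sqrt(3) * h), 3))  # distance of each point from the center: 0.775
```

Here the design's third moments vanish by symmetry, so matching the second moments alone satisfies the sufficient condition for $d_1 + d_2 = 3$.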


6.  SOME  OTHER  CONSIDERATIONS  IN  DESIGN  CHOICE 

There  is  insufficient  space  to  discuss  here  all  of  the  items  in  Table  1 
that,  in  one  circumstance  or  another,  it  might  be  necessary  to  take  into 
account,  but  mention  will  be  made  of  a  few. 

Lack  of  Fit  (iii).  Sequential  Assembly  (vi).  Blocking  (v).  Estimation 
of  Error  (vii).  Transformation  Estimation  (iv) 


While the adequacy of a particular approximating function to explore a
region of current interest is always to some extent a matter of guesswork,
simple approximations requiring fewer runs for their elucidation will usually
be preferred to more complicated ones. This leads to a strategy of building
up from simpler models, rather than down from more complicated ones. A
practical procedure is then: to employ the simplest approximating function
which it is hoped may be adequate; to allow for checking its adequacy of fit
(see also [1], [2], [6], and [19]); and to switch to a more elaborate
approximating function when this appears necessary. The implications for
designs are (a) that they should provide for checking model adequacy; (b) that
they should be capable of sequential assembly, so that a design of order d can
be augmented to one of order d + 1; and (c) that, since conditions may change
slightly from one set of runs to another, especially affecting the level of the
response, the pieces of the design should form orthogonal blocks.

For illustration, Figure 13 shows the sequential assembly of a design
arranged in three orthogonal blocks, each of six runs, labeled I, II, and III.
Block I is a first order design, but it also provides a check for overall
curvature (obtained by contrasting the average response of the center points
with the average response on the cube). A single contrast of the center
responses is available as a gross check on previous information about
experimental error. If, after analyzing the results from Block I, there are
doubts about the adequacy of a first degree polynomial model, Block II may be
performed. It uses the complementary simplex, and the two parts together form
a first order design (I+II) with much greater ability to detect lack of fit
due to second order terms, provided by additional orthogonal contrasts
estimating the two-factor interactions. The addition of Block III produces a
composite design (I+II+III) which allows a full second degree approximating
equation to be fitted if this appears to be desirable. The complete design
also provides orthogonal checking contrasts for lack of quadraticity in each
of the three directions ([9], [12]). These contrasts can also be regarded as
checking the need for transformation in each of the X's. Finally, if it were
decided that more information about experimental error was desirable, the
replication of the star in a further Block IV could furnish this, and also
provide some increase in the robustness of the design to wild observations.

Figure 13. An example of sequential assembly, showing checks of linearity and
quadraticity.
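Figure 13 is not reproduced here, so the following sketch assumes one arrangement consistent with the description: Block I is the half fraction $x_3 = x_1x_2$ of the $2^3$ cube (a tetrahedron, that is, a simplex) plus two center points; Block II is the complementary half fraction $x_3 = -x_1x_2$ plus two center points; Block III is the six axial "star" points at distance α. Orthogonal blocking requires every term of the second order model to have the same mean within each block as in the whole design, and under this arrangement that forces $\alpha^2 = 2$ (each block must have $x_i^2$ averaging 4/6). The block layout and α are assumptions, not read from the figure.

```python
import itertools
import math

ALPHA = math.sqrt(2.0)   # assumed star distance (forced by orthogonal blocking here)


def half_fraction(sign):
    """Four cube points with x3 = sign * x1 * x2, plus two center points."""
    cube = [(x1, x2, sign * x1 * x2) for x1, x2 in itertools.product([-1.0, 1.0], repeat=2)]
    return cube + [(0.0, 0.0, 0.0)] * 2


def star(alpha):
    """Six axial points at distance alpha."""
    return [tuple(alpha * s if j == i else 0.0 for j in range(3)) for i in range(3) for s in (-1.0, 1.0)]


def model_terms(x):
    """Second order model terms in three factors, excluding the constant."""
    x1, x2, x3 = x
    return [x1, x2, x3, x1 * x1, x2 * x2, x3 * x3, x1 * x2, x1 * x3, x2 * x3]


def column_means(runs):
    return [sum(col) / len(runs) for col in zip(*(model_terms(r) for r in runs))]


def blocks_orthogonal(blocks, tol=1e-12):
    """True if every model term has the same mean in each block as in the whole design."""
    overall = column_means([run for block in blocks for run in block])
    return all(
        abs(m - o) <= tol
        for block in blocks
        for m, o in zip(column_means(block), overall)
    )


blocks = [half_fraction(+1.0), half_fraction(-1.0), star(ALPHA)]
print(sum(len(b) for b in blocks), blocks_orthogonal(blocks))                 # 18 True
print(blocks_orthogonal([half_fraction(+1.0), half_fraction(-1.0), star(1.5)]))  # False
```

Any other star distance leaves the pure quadratic terms correlated with the block differences, so a shift in level between runs would bias the curvature estimates.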

Robustness 


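Approaches to the robust design of experiments have recently been
reviewed by Herzberg [28]; see also [29]. In particular, Box and Draper [8]
suggested that the effects of wild observations could be minimized by making
$\sum_u r_{uu}^2$ small, where $\mathbf{R} = \{r_{tu}\} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. Since $\sum_u r_{uu} = p$, this is
equivalent to minimizing

$\sum_u r_{uu}^2 - p^2/n = n \operatorname{Var}\{V(\hat{y}_u)\}$,

where $V(\hat{y}_u) = \operatorname{Var}(\hat{y}_u)/\sigma^2 = r_{uu}$ and the variance on the right is taken over the
n design points. This quantity takes the value zero when $V(\hat{y}_u) = p/n$
$(u = 1, 2, \ldots, n)$. Thus G-optimal designs are optimally robust in this sense.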
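A small NumPy check of this criterion, using two illustrative first order designs in two factors: the $2^2$ factorial, for which every $r_{uu}$ equals p/n = 3/4, and the same factorial with one center point added, which makes the $r_{uu}$ unequal. The function name is hypothetical.

```python
import numpy as np


def wild_observation_criterion(X):
    """Return (sum of r_uu^2 - p^2/n, diagonal of R) with R = X (X'X)^{-1} X'."""
    n, p = X.shape
    R = X @ np.linalg.solve(X.T @ X, X.T)
    r = np.diag(R)
    return float(np.sum(r**2) - p**2 / n), r


# First order model (1, x1, x2): the 2^2 factorial, then the same with a center point
square = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]], dtype=float)
with_center = np.vstack([square, [1.0, 0.0, 0.0]])

crit_square, r_square = wild_observation_criterion(square)
crit_center, r_center = wild_observation_criterion(with_center)
print(crit_square, r_square)   # about 0; every r_uu equals p/n = 3/4
print(crit_center, r_center)   # about 0.2; r_uu = 0.7 at the corners, 0.2 at the center
```

The second value equals n times the population variance of the five $r_{uu}$, which illustrates the identity above.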

Size  of  the  experimental  design 

A good experimental design is one which focuses experimental effort on
what is judged important in the particular current experimental context.
Suppose that, in addition to estimating the p parameters of the assumed
model form, it is concluded that $f \ge 0$ contrasts are needed to check
adequacy of fit, $b \ge 0$ further contrasts for blocking, and that an estimate
of experimental error is needed having $e \ge 0$ degrees of freedom. To obtain
independent estimates of all items of interest we then require a design
containing at least p + f + b + e runs. However, the importance of checking
fit, blocking, and obtaining an independent estimate of error will differ in
different circumstances, and the minimum value of n will correspondingly
differ. But this minimum design will in any case be adequate only if $\sigma^2$ is
below some critical value. When $\sigma^2$ is larger, designs larger than the
minimum design will be needed to obtain estimates of sufficient precision. In
this circumstance, rather than merely replicating the minimum design, we may
take the opportunity to employ a higher order design allowing the fitting of a
more elaborate approximating function, which can then cover a wider experimental
region. Notice that even when $\sigma^2$ is small, designs for which n is larger
than p are not necessarily wasteful. This depends on whether the additional
degrees of freedom are genuinely used to achieve the experimenter's current
objectives.
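The bookkeeping can be written down directly. For a full polynomial of degree d in k variables, $p = \binom{k+d}{d}$; the values of f, b, and e in the example call are hypothetical, chosen only to show how an 18-run second order design in three factors might be accounted for.

```python
from math import comb


def n_parameters(k, d):
    """Number of coefficients in a full polynomial of degree d in k variables."""
    return comb(k + d, d)


def minimum_runs(k, d, f=0, b=0, e=0):
    """p + f + b + e: parameters, lack-of-fit contrasts, blocking contrasts, error df."""
    return n_parameters(k, d) + f + b + e


print(n_parameters(3, 2))                  # 10
print(minimum_runs(3, 2, f=3, b=2, e=3))   # 18, with hypothetical f, b, e
```
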

Simple  Data  Patterns 

It  has  sometimes  been  argued  that  we  may  as  well  choose  points  randomly 
to  cover  the  "design  region"  or  employ  some  algorithm  that  distributes  them 
evenly  even  though  this  does  not  result  in  a  simple  data  pattern  such  as  is 
achieved  by  factorials  and  composite  response  surface  designs.  In  favor  of 
this  idea  it  has  been  urged  that  the  fitting  of  a  function  by  least  squares  to 
a  haphazard  set  of  points  is  no  longer  a  problem  for  modern  computational 
devices.  This  is  true,  but  overlooks  an  important  attribute  of  designs  which 
form  simple  patterns.  The  statistician's  task  as  a  member  of  a  scientific 
team  is  a  dual  one,  involving  inductive  criticism  and  deductive  estimation. 

The  latter  involves  deducing  in  the  light  of  the  data  the  consequences  of 
given  assumptions  (estimating  the  fitted  function),  and  this  can  certainly  be 
done  with  haphazard  designs.  But  the  former  involves  the  question  (a)  of  what 
function  should  be  fitted  in  the  first  place,  and  (b)  of  how  to  examine 
residuals from the fitted function in an attempt to understand deviations from
the  initial  model,  in  particular  in  relation  to  the  independent  variables,  and 
so  to  be  led  to  appropriate  model  modification. 

Designs  such  as  factorials  and  composite  response  surface  designs 
employ  patterns  of  experimental  points  that  allow  many  such  comparisons  to  be 
made,  both  for  the  original  observations  and  for  the  residuals  from  any  fitted 
function. For example, consider a $3^2$ factorial design used to elucidate the
effects  of  temperature  and  concentration  on  some  response  such  as  yield. 
Intelligent  inductive  criticism  is  greatly  enhanced  by  the  possibility  of 
being  able  to  plot  the  original  data  and  residuals  against  temperature  for 
each  individual  level  of  concentration,  and  against  concentration  for  each 
individual  level  of  temperature. 


7.  CONCLUSION 


(i)  We  must  look  for  good  design  criteria  which  measure 
characteristics  of  the  experimental  arrangement  in  which  the  scientist  might 
sensibly  be  interested.  Because  the  importance  of  various  characteristics 
will  differ  in  different  situations,  tables  of  such  criteria  for  particular 
designs  would  encourage  good  judgment  to  be  used  in  matching  the  design  to  the 
scientific context. Optimum levels of these criteria can be useful as
benchmarks in judging the efficiencies of a particular design with respect to
these various criteria.

(ii) However, good designs must in practice be good compromises, and it
is doubtful how useful single criterion optimal designs are in locating such
compromises. An optimal design is represented by a point in the
multidimensional space of the coordinates of the design, and a series of
different criteria will give a series of such extremal points which can be very
differently located. Obviously, knowledge of the location of such extrema may
tell us almost nothing about the location of good compromises. For this we
would need to study the joint behaviour of the criterion functions at levels
close to their extremal values. One limited but useful step would be to
investigate further which criteria are in accord (such as G-optimality and
robustness to wild observations) and which are in conflict (such as variance
and bias).

(iii) It is true that the problem of experimental design is full of
scientific arbitrariness (no two investigators would choose the same
variables, start their experiments in the same place, change variables over
the same regions, and so on), but science works not by uniqueness but by
employing iterative techniques which tend to converge. Clearly we must learn
to live with scientific arbitrariness, or else we are in a world of
make-believe. But we can make the problems worse, not better, by introducing
arbitrariness for purely mathematical reasons.